The Infinite Mixture of Infinite Gaussian Mixtures for Clustering Data Sets with Multi-mode and Rare Clusters

Authors

  • Halid Z. Yerebakan
  • Bartek Rajwa
  • Bethany Ehlmann
  • Murat Dundar
Abstract

Motivated by the clustering of data sets with multi-mode and skewed cluster distributions, we introduce a two-layer Bayesian Gaussian mixture model that is non-parametric with respect to both the number of clusters and their shapes. The upper layer of the model uses a global Dirichlet Process (DP) to model the number of clusters and their sizes, while the lower layer assigns one local DP to every cluster generated by the upper layer. Dependency between the local DPs and the global DP is established by defining the local DPs using parameters generated by the global DP. We validate the proposed model on clustering problems involving rare clusters, with examples from hyperspectral imaging and flow cytometry. We demonstrate that our model is superior to benchmark clustering techniques in identifying rare clusters while remaining highly competitive in modeling multi-mode and skewed clusters.
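The two-layer structure described in the abstract can be illustrated with a small generative sketch. This is not the authors' implementation: the abstract does not give the base distributions or the inference procedure, so everything below is an assumption made for illustration. The sketch uses the Chinese restaurant process view of a DP, isotropic Gaussian components, and hypothetical names and parameters (`crp_assign`, `sample_two_layer`, `alpha`, `gamma`, `cluster_scale`, `component_scale`, `noise_scale`). The upper layer picks a cluster and generates a center for it; the lower layer runs a separate process inside that cluster, and its component means are drawn around the cluster center, which is one simple way to tie each local DP to a parameter generated by the global DP.

```python
import numpy as np


def crp_assign(counts, alpha, rng):
    """Draw an index from a Chinese restaurant process.

    counts holds the current group sizes; a return value equal to
    len(counts) means "open a new group".
    """
    weights = np.array(counts + [alpha], dtype=float)
    return int(rng.choice(len(weights), p=weights / weights.sum()))


def sample_two_layer(n, alpha=1.0, gamma=1.0, dim=2, cluster_scale=10.0,
                     component_scale=1.5, noise_scale=0.3, seed=0):
    """Sample n points from an illustrative two-layer mixture (assumed form)."""
    rng = np.random.default_rng(seed)
    cluster_counts = []    # upper layer: cluster sizes
    cluster_centers = []   # parameter generated by the upper layer per cluster
    component_counts = []  # lower layer: per-cluster component sizes
    component_means = []   # lower layer: per-cluster component means
    X = np.empty((n, dim))
    labels = np.empty((n, 2), dtype=int)  # columns: (cluster, component)

    for i in range(n):
        # Upper layer: choose an existing cluster or open a new one.
        k = crp_assign(cluster_counts, alpha, rng)
        if k == len(cluster_counts):
            cluster_counts.append(0)
            cluster_centers.append(rng.normal(0.0, cluster_scale, size=dim))
            component_counts.append([])
            component_means.append([])
        cluster_counts[k] += 1

        # Lower layer: choose a component within cluster k. New component
        # means are centered on the cluster's own center (assumed coupling).
        j = crp_assign(component_counts[k], gamma, rng)
        if j == len(component_counts[k]):
            component_counts[k].append(0)
            component_means[k].append(
                rng.normal(cluster_centers[k], component_scale))
        component_counts[k][j] += 1

        # Observation: isotropic Gaussian noise around the component mean.
        X[i] = rng.normal(component_means[k][j], noise_scale)
        labels[i] = (k, j)
    return X, labels
```

Calling `X, labels = sample_two_layer(500)` returns the points together with their (cluster, component) labels. Under these assumed settings, a cluster that owns several components can take a multi-mode or skewed shape, and clusters opened late in the process tend to stay small, which loosely mirrors the rare-cluster setting the abstract targets.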


Similar resources

The Infinite Mixture of Infinite Gaussian Mixtures

Dirichlet process mixture of Gaussians (DPMG) has been used in the literature for clustering and density estimation problems. However, many real-world data exhibit cluster distributions that cannot be captured by a single Gaussian. Modeling such data sets by DPMG creates several extraneous clusters even when clusters are relatively well-defined. Herein, we present the infinite mixture of infini...


Unsupervised Classification of Functions using Dirichlet Process Mixtures of Gaussian Processes

This technical report presents a novel algorithm for unsupervised clustering of functions. It proceeds by developing the theory of unsupervised classification in mixtures from the familiar mixture of Gaussian distributions to the infinite mixture of Gaussian processes. At each stage, both a theoretical and an algorithmic exposition are presented. We consider unsupervised classification (or cl...


Clustering Protein Sequence and Structure Space with Infinite Gaussian Mixture Models

We describe a novel approach to the problem of automatically clustering protein sequences and discovering protein families, subfamilies, etc., based on the theory of infinite Gaussian mixture models. This method allows the data itself to dictate how many mixture components are required to model it, and provides a measure of the probability that two proteins belong to the same cluster. We illust...


Publication date: 2015